{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Notebook 4.2: Baichuan-13B"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2.1 Overview\n",
    "\n",
    "This notebook shows how to run [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B) Chinese inference on low-cost PCs (without the need of discrete GPU) using [BigDL-LLM](https://github.com/intel-analytics/BigDL/tree/main/python/llm) APIs. Baichuan-13B is an open-source, commercially available large-scale language model developed by Baichuan Intelligent Technology following [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B). Baichuan-13B also can be found in [Huggingface models](https://huggingface.co/models) in following [link](https://huggingface.co/baichuan-inc/Baichuan-13B-Chat)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2.2 Installation\n",
    "\n",
    "First of all, install BigDL-LLM in your prepared environment. For best practices of environment setup, refer to [Chapter 2](../ch_2_Environment_Setup/README.md) in this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install --pre --upgrade bigdl-llm[all]\n",
    "\n",
    "# Additional package required for Baichuan-13B-Chat to conduct generation\n",
    "!pip install -U transformers_stream_generator"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The all option is for installing other required packages by BigDL-LLM."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2.3 Load Model and Tokenizer\n",
    "\n",
    "### 4.2.3.1 Load Model\n",
    "\n",
    "Load Baichuan model with low-bit optimization(INT4) for lower resource cost using BigDL-LLM APIs, which convert the relevant layers in the model into INT4 format. \n",
    "\n",
    "> **Note**\n",
    ">\n",
    "> You can set `model_path` to either a Hugging Face repo id or a local model path."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from bigdl.llm.transformers import AutoModelForCausalLM\n",
    "\n",
    "model_path = \"baichuan-inc/Baichuan-13B-Chat\"\n",
    "model = AutoModelForCausalLM.from_pretrained(model_path,\n",
    "                                             load_in_4bit=True,\n",
    "                                             trust_remote_code=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.3.2 Load Tokenizer\n",
    "\n",
    "A tokenizer is also needed for LLM inference. It is used to encode input texts to tensors to feed to LLMs, and decode the LLM output tensors to texts. You can use [Huggingface transformers](https://huggingface.co/docs/transformers/index) API to load the tokenizer directly. It can be used seamlessly with models loaded by BigDL-LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from transformers import AutoTokenizer\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_path,\n",
    "                                          trust_remote_code=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4.2.4 Inference"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.4.1 Create Prompt Template\n",
    "\n",
    "Before generating, you need to create a prompt template, we show an example of a template for question and answering here. You can tune the prompt based on your own model as well."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "BAICHUAN_PROMPT_FORMAT = \"<human>{prompt} <bot>\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4.2.4.2 Generate\n",
    "\n",
    "Then, you can generate output with loaded model and tokenizer.\n",
    "\n",
    "> **Note**\n",
    ">\n",
    "> The `max_new_tokens` parameter of the `generate` function sets the maximum number of new tokens to generate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "-------------------- Output --------------------\n",
      "<human>AI是什么？ <bot>人工智能(Artificial Intelligence，简称AI)是指由人制造出来的系统所表现出的智能，通常是通过计算机程序和传感器实现的\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "\n",
    "prompt = \"AI是什么？\"\n",
    "n_predict = 32\n",
    "with torch.inference_mode():\n",
    "        prompt = BAICHUAN_PROMPT_FORMAT.format(prompt=prompt)\n",
    "        input_ids = tokenizer.encode(prompt, return_tensors=\"pt\")\n",
    "        # if your selected model is capable of utilizing previous key/value attentions\n",
    "        # to enhance decoding speed, but has `\"use_cache\": false` in its model config,\n",
    "        # it is important to set `use_cache=True` explicitly in the `generate` function\n",
    "        # to obtain optimal performance with BigDL-LLM INT4 optimizations\n",
    "        output = model.generate(input_ids,\n",
    "                                max_new_tokens=n_predict)\n",
    "        output_str = tokenizer.decode(output[0], skip_special_tokens=True)\n",
    "        print('-'*20, 'Output', '-'*20)\n",
    "        print(output_str)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llm-zcg",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
